Felipe Angelim
Tech Lead @ Mercado Libre
Core Dev @ Sktime
Creator/Dev @ Prophetverse
\[ \underbrace{P(\theta|X)}_{\text{Posterior}} = \frac{\overbrace{P(X|\theta)}^{\text{Likelihood}} \; \overbrace{P(\theta)}^{\text{Prior}}}{\underbrace{P(X)}_{\text{Evidence}}} \implies \quad \quad P(\theta|X) {\LARGE \propto} P(X|\theta) P(\theta) \]
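The proportionality above can be checked numerically on a grid. The sketch below is only an illustration: the coin-flip data (7 heads in 10 flips), the flat prior, and the function name `grid_posterior` are assumptions, not part of the original material.

```python
import math


def grid_posterior(heads, flips, grid_size=101):
    """Posterior over a coin's bias theta, evaluated on an evenly spaced grid."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    # Prior P(theta): flat over [0, 1] (an assumption made for illustration)
    prior = [1.0 for _ in thetas]
    # Likelihood P(X | theta): binomial probability of the observed heads
    likelihood = [
        math.comb(flips, heads) * t**heads * (1 - t) ** (flips - heads)
        for t in thetas
    ]
    # Unnormalized posterior: likelihood times prior
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]
    # Evidence P(X): on a grid it is just the normalizing sum
    evidence = sum(unnormalized)
    posterior = [u / evidence for u in unnormalized]
    return thetas, posterior


thetas, posterior = grid_posterior(heads=7, flips=10)
# Grid point with the highest posterior mass
theta_map = thetas[posterior.index(max(posterior))]
print(f"posterior mode: {theta_map:.2f}")
```

Dividing by the evidence only rescales the curve, which is why the posterior's shape is fully determined by likelihood times prior.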
\[ \alpha \sim N(0, 10) \\ \beta \sim N(0, 10) \\ \sigma \sim \text{HalfCauchy}(5) \\ Y_i \sim N(\alpha + \beta X_i, \sigma) \]
import numpyro
from numpyro import distributions as dist
def linear_regression_model(x, y=None):
    # Priors on intercept and slope
    alpha = numpyro.sample("alpha", dist.Normal(0.0, 10.0))
    beta = numpyro.sample("beta", dist.Normal(0.0, 10.0))
    # Prior on noise scale (sigma > 0)
    sigma = numpyro.sample("sigma", dist.HalfCauchy(5.0))
    # Likelihood
    mean = alpha + beta * x
    with numpyro.plate("data", x.shape[0]):
        numpyro.sample("obs", dist.Normal(mean, sigma), obs=y)

\[ \arg\max_\theta P(\theta | X) \]
Ridge regression:
\[ Y|X = X\beta + \epsilon, \quad \epsilon \sim N(0, \sigma^2 I) \\ \hat{\beta} = \arg\min_\beta \left\{ ||y - X\beta||^2 + \lambda ||\beta||^2 \right\} \]
Bayesian Ridge regression:
\[ Y | X \sim N(X\beta, \sigma^2 I) \\ \beta \sim N(0, \frac{I}{\tau^2}) \\ \hat{\beta} = \arg\max_\beta P(\beta | X, Y) \]
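A short derivation makes the link between the two formulations explicit. It assumes that \(N(0, \frac{I}{\tau^2})\) denotes a Gaussian with covariance \(I/\tau^2\) (so \(\tau^2\) acts as a prior precision) and that \(\sigma^2\) is known; neither is stated outright in the slides.

\[ P(\beta | X, Y) \propto P(Y | X, \beta) \, P(\beta) \\ -\log P(\beta | X, Y) = \frac{1}{2\sigma^2} ||y - X\beta||^2 + \frac{\tau^2}{2} ||\beta||^2 + \text{const} \\ \hat{\beta} = \arg\max_\beta P(\beta | X, Y) = \arg\min_\beta \left\{ ||y - X\beta||^2 + \sigma^2 \tau^2 ||\beta||^2 \right\} \]

Under these assumptions the ridge penalty is \(\lambda = \sigma^2 \tau^2\): a tighter prior (larger \(\tau^2\)) corresponds to stronger regularization.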
\[ P(\beta) \]
\[ G(t) = \frac{C_1(t-t_0) + C_2}{\left(1 + \exp(-\alpha v (t - t_0))\right)^{\frac{1}{v}}} \]
\[ C_2 \in \mathbb{R}_+ : \text{constant capacity term} \\ C_1 \in \mathbb{R}_+ : \text{linear rate of increase of the capacity} \\ t_0 \in \mathbb{R} : \text{time offset term} \\ v \in \mathbb{R}_+ : \text{determines the shape of the curve} \\ \alpha \in \mathbb{R} : \text{rate} \]
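To get a feel for how the parameters shape \(G(t)\), the formula can be evaluated directly. This is a plain-Python sketch of the equation only, not Prophetverse's implementation; the function name and the parameter values are illustrative assumptions.

```python
import math


def generalized_logistic_trend(t, c1, c2, t0, v, alpha):
    """Evaluate G(t): a linearly growing capacity damped by a generalized logistic."""
    capacity = c1 * (t - t0) + c2
    return capacity / (1.0 + math.exp(-alpha * v * (t - t0))) ** (1.0 / v)


# Illustrative parameter values (assumptions)
params = dict(c1=0.5, c2=100.0, t0=10.0, v=1.0, alpha=0.3)

# At t = t0 the denominator is 2**(1/v), so G(t0) = c2 / 2**(1/v)
at_offset = generalized_logistic_trend(10.0, **params)

# Far past t0 the logistic factor approaches 1 and G(t) follows the linear capacity
late = generalized_logistic_trend(60.0, **params)

print(at_offset, late)
```

Long after \(t_0\) the curve no longer saturates at a constant but keeps growing at rate \(C_1\), which is what distinguishes it from a plain logistic trend.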
See more in the custom trend tutorial in Prophetverse.
\[ E[\text{Sales} | \text{Marketing Channels}] = \text{trend} + \text{seasonality} + f_{\text{social_media}}(x_{\text{social_media}}) + f_{\text{email}}(x_{\text{email}}) + f_{\text{tv}}(x_{\text{tv}}) \]
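The additive structure above can be sketched as a sum of per-channel response functions. The slides do not specify the form of each \(f\), so the Hill-type saturation curve, the linear TV effect, and every number below are hypothetical choices made purely for illustration.

```python
def hill_saturation(spend, half_saturation, max_effect, slope=1.0):
    """Hypothetical saturating response: rises with spend, levels off at max_effect."""
    if spend <= 0:
        return 0.0
    return max_effect * spend**slope / (spend**slope + half_saturation**slope)


def expected_sales(trend, seasonality, spend_by_channel, effect_by_channel):
    """Additive model: trend + seasonality + sum of channel effects."""
    channel_total = sum(
        effect_by_channel[name](spend) for name, spend in spend_by_channel.items()
    )
    return trend + seasonality + channel_total


# Hypothetical response functions, one per marketing channel
channel_effects = {
    "social_media": lambda x: hill_saturation(x, half_saturation=50.0, max_effect=20.0),
    "email": lambda x: hill_saturation(x, half_saturation=10.0, max_effect=5.0),
    "tv": lambda x: 0.02 * x,  # linear effect, also an assumption
}

spend = {"social_media": 50.0, "email": 10.0, "tv": 100.0}
sales = expected_sales(
    trend=100.0,
    seasonality=-5.0,
    spend_by_channel=spend,
    effect_by_channel=channel_effects,
)
print(sales)
```

Because the components are additive, each channel's contribution can be read off separately from the fitted model.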
Which of these is a Bayesian statement, and which is Frequentist?
A. There is a 95% probability that the true value \(\theta\) lies in my interval \([A, B]\).
B. There is a 95% chance that my interval \([A, B]\) contains the true quantity \(\theta\).
Here, \([A, B]\) is an interval generated by a model, and \(\theta\) is the parameter of interest.
Answer:
A. (Bayesian): “There is a 95% probability that the true value \(\theta\) lies in \([A, B]\).” This treats \(\theta\) as random and the data as fixed. It is a probability statement about the parameter.
B. (Frequentist): “There is a 95% chance that \([A, B]\) contains the true quantity \(\theta\).” This treats \(\theta\) as fixed and the data (and thus the interval) as random. It is a statement about the procedure: if repeated many times, 95% of such intervals would capture the true \(\theta\).
The key difference: Bayesian credible intervals condition on the observed data. Frequentist confidence intervals consider the randomness of the data generation process.
Adapted from Jake VanderPlas. (Link to video).
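The two readings of the interval quiz can be made concrete with a small simulation. The sketch below assumes a Gaussian mean with known noise scale, a conjugate Gaussian prior, and arbitrary values for the true parameter and sample size; all of these are illustrative choices rather than part of the original quiz.

```python
import random
import statistics

random.seed(0)

TRUE_THETA = 3.0  # fixed but unknown to the analyst
SIGMA = 2.0       # known noise scale (assumption)
N = 25            # observations per dataset
Z = 1.96          # approximate 97.5% quantile of the standard normal


def draw_sample():
    return [random.gauss(TRUE_THETA, SIGMA) for _ in range(N)]


def confidence_interval(sample):
    """Frequentist 95% interval for the mean with known sigma."""
    center = statistics.fmean(sample)
    half_width = Z * SIGMA / len(sample) ** 0.5
    return center - half_width, center + half_width


def credible_interval(sample, prior_mean=0.0, prior_sd=10.0):
    """Bayesian 95% interval from the conjugate Normal-Normal posterior."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = len(sample) / SIGMA**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * statistics.fmean(sample)
    )
    half_width = Z * post_var**0.5
    return post_mean - half_width, post_mean + half_width


# Frequentist reading: theta is fixed, the interval varies across datasets
repetitions = 4000
hits = 0
for _ in range(repetitions):
    low, high = confidence_interval(draw_sample())
    hits += low <= TRUE_THETA <= high
coverage = hits / repetitions

# Bayesian reading: one observed dataset, probability statement about theta
observed = draw_sample()
cred_low, cred_high = credible_interval(observed)

print(f"coverage over repeated datasets: {coverage:.3f}")
print(f"95% credible interval for this dataset: [{cred_low:.2f}, {cred_high:.2f}]")
```

The coverage figure is a property of the procedure across many datasets, while the credible interval is a statement conditioned on the single dataset that was actually observed.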